Convex-constrained Sparse Additive Modeling and Its Extensions
Sparse additive modeling is a class of effective methods for performing
high-dimensional nonparametric regression. In this work, we show how shape
constraints, such as convexity/concavity and their extensions, can be integrated
into additive models. The proposed sparse difference of convex additive models
(SDCAM) can estimate most continuous functions without any a priori smoothness
assumption. Motivated by a characterization of difference of convex functions,
our method incorporates a natural regularization functional to avoid
overfitting and to reduce model complexity. Computationally, we develop an
efficient backfitting algorithm with linear per-iteration complexity.
Experiments on both synthetic and real data verify that our method is
competitive against state-of-the-art sparse additive models, with improved
performance in most scenarios.
Comment: 17 pages, 2 figures
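To make the backfitting structure above concrete, here is a minimal sketch of the generic loop: cycle through coordinates, refit each univariate component on the partial residuals, and repeat, at a per-iteration cost linear in n × d. The 1-D fitter below is a hypothetical polynomial stand-in; SDCAM would instead solve a difference-of-convex regression subproblem with its regularization functional, which is omitted here.

```python
import numpy as np

def backfit_additive(X, y, fit_1d, n_iters=20):
    """Generic backfitting for an additive model y ~ sum_j f_j(x_j).

    fit_1d(x, r) -> callable: fits one univariate component to the
    partial residuals r. Structural sketch only; each pass over the
    data costs O(n * d) plus the cost of the 1-D fits.
    """
    n, d = X.shape
    components = [None] * d
    fitted = np.zeros((n, d))                        # current f_j(x_ij) values
    for _ in range(n_iters):
        for j in range(d):
            r = y - fitted.sum(axis=1) + fitted[:, j]    # partial residual
            components[j] = fit_1d(X[:, j], r)
            fitted[:, j] = components[j](X[:, j])
    return components

def poly_fit(x, r, deg=3):
    """Hypothetical 1-D fitter (a cubic), standing in for the paper's
    difference-of-convex component estimator."""
    c = np.polyfit(x, r, deg)
    return lambda z: np.polyval(c, z)
```

Since any sufficiently smooth univariate function can be written as a difference of two convex functions, a flexible 1-D fitter is a reasonable placeholder in this sketch.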
Additive Approximations in High Dimensional Nonparametric Regression via the SALSA
High-dimensional nonparametric regression is an inherently difficult problem,
with known lower bounds that depend exponentially on the dimension. A popular
strategy to alleviate this curse of dimensionality has been to use additive
models of \emph{first order}, which model the regression function as a sum of
independent functions on each dimension. Though useful in controlling the
variance of the estimate, such models are often too restrictive in practical
settings. Between non-additive models, which often have large variance, and
first-order additive models, which have large bias, there has been little work
exploiting the trade-off in the middle via additive models of intermediate order.
In this work, we propose SALSA, which bridges this gap by allowing interactions
between variables, but controls model capacity by limiting the order of
interactions. SALSA minimises the residual sum of squares with squared RKHS
norm penalties. Algorithmically, it can be viewed as Kernel Ridge Regression
with an additive kernel. When the regression function is additive, the excess
risk is only polynomial in dimension. Using the Girard-Newton formulae, we
efficiently sum over a combinatorial number of terms in the additive expansion.
Via a comparison on real datasets, we show that our method is competitive
against other alternatives.
Comment: International Conference on Machine Learning (ICML) 2016
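The Girard-Newton step can be sketched directly. Assuming one base kernel per dimension, the order-m term of the additive kernel is the m-th elementary symmetric polynomial of the d per-dimension kernel values, which Newton's identities recover from power sums with O(M^2) elementwise matrix operations. Summing orders 1 through M below is an assumption about how the orders are combined, and the final solve is plain kernel ridge regression.

```python
import numpy as np

def additive_kernel(K_dims, order):
    """Additive kernel up to `order` via the Girard-Newton identities.

    K_dims: array of shape (d, n, n); K_dims[j] is a base kernel on
    dimension j alone. Returns e_1 + ... + e_order, where e_m is the
    m-th elementary symmetric polynomial of the d base kernels,
    computed elementwise from power sums p_t = sum_j K_j ** t.
    """
    d, n, _ = K_dims.shape
    p = [np.sum(K_dims ** t, axis=0) for t in range(1, order + 1)]
    e = [np.ones((n, n))]                            # e_0 = 1
    for m in range(1, order + 1):
        e_m = np.zeros((n, n))
        for t in range(1, m + 1):                    # Newton's identity
            e_m += (-1) ** (t - 1) * e[m - t] * p[t - 1]
        e.append(e_m / m)
    return sum(e[1:])

def krr_fit(K, y, lam=1e-2):
    """Kernel ridge regression: alpha = (K + lam * n * I)^{-1} y."""
    n = K.shape[0]
    return np.linalg.solve(K + lam * n * np.eye(n), y)

# toy usage with RBF base kernels on each coordinate (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5))
y = np.sin(X[:, 0]) + X[:, 1] * X[:, 2]
diff = X[:, None, :] - X[None, :, :]
K_dims = np.exp(-0.5 * np.transpose(diff, (2, 0, 1)) ** 2)
alpha = krr_fit(additive_kernel(K_dims, order=2), y)
```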
Generalized Conditional Gradient for Sparse Estimation
Structured sparsity is an important modeling tool that expands the
applicability of convex formulations for data analysis, however it also creates
significant challenges for efficient algorithm design. In this paper we
investigate the generalized conditional gradient (GCG) algorithm for solving
structured sparse optimization problems, demonstrating that, with some
enhancements, it can provide a more efficient alternative to current
state-of-the-art approaches. After providing a comprehensive overview of the convergence
properties of GCG, we develop efficient methods for evaluating polar operators,
a subroutine that is required in each GCG iteration. In particular, we show how
the polar operator can be efficiently evaluated in two important scenarios:
dictionary learning and structured sparse estimation. A further improvement is
achieved by interleaving GCG with fixed-rank local subspace optimization. A
series of experiments on matrix completion, multi-class classification,
multi-view dictionary learning and overlapping group lasso shows that the
proposed method can significantly reduce the training cost of current
alternatives.
Comment: 67 pages, 20 figures
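For intuition, here is a minimal conditional-gradient sketch for one of the scenarios mentioned, matrix completion, using the constrained (trace-norm ball) variant rather than the paper's penalized GCG formulation: the polar operator reduces to a top singular-vector pair, so each iteration adds a single rank-1 atom. The fixed-rank local subspace optimization that the paper interleaves is omitted.

```python
import numpy as np

def cg_matrix_completion(M, mask, tau=10.0, n_iters=100):
    """Conditional-gradient sketch for
        min_X 0.5 * ||mask * (X - M)||_F^2   s.t.   ||X||_tr <= tau.

    The polar operator for the trace norm is the top singular-vector
    pair of the negative gradient, i.e. one rank-1 atom per iteration.
    """
    X = np.zeros_like(M)
    for t in range(n_iters):
        G = mask * (X - M)                       # gradient of the loss
        U, s, Vt = np.linalg.svd(-G)             # polar operator: top pair
        A = tau * np.outer(U[:, 0], Vt[0])       # extreme point of the ball
        step = 2.0 / (t + 2.0)                   # standard CG step size
        X = (1 - step) * X + step * A
    return X
```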
Provably noise-robust, regularised k-means clustering
We consider the problem of clustering in the presence of noise. That is, when
on top of cluster structure, the data also contains a subset of
\emph{unstructured} points. Our goal is to detect the clusters despite the
presence of many unstructured points. Any algorithm that achieves this goal is
noise-robust. We consider a regularisation method which converts any
center-based clustering objective into a noise-robust one. We focus on the
k-means objective and we prove that the regularised version of k-means is
NP-hard even for k = 1. We consider two algorithms based on convex (SDP and
LP) relaxations of the regularised objective and prove robustness guarantees for
both.
The SDP and LP relaxations of the standard (non-regularised) k-means
objective have been studied previously by [ABC+15]. Under the stochastic ball
model of the data, they show that the SDP-based algorithm recovers the
underlying structure as long as the balls are sufficiently well separated. We
improve upon this result in two ways. First, we show recovery under a weaker
separation requirement. Second, our regularised algorithm recovers the balls
even in the presence of noise, so long as the number of noisy points is not too
large. We complement our theoretical analysis with simulations and analyse the
effect of various parameters, such as the regularisation constant and the noise
level, on the performance of our algorithm. In the presence of noise, our
algorithm performs better than k-means++ on MNIST.
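A minimal sketch of the regularised objective's effect, using a Lloyd-style local search rather than the SDP/LP relaxations analysed in the paper: each point pays either its squared distance to the nearest center or a flat penalty lam to be declared noise.

```python
import numpy as np

def regularised_kmeans(X, k, lam, n_iters=50, seed=0):
    """Lloyd-style local search for the regularised k-means idea:
    each point pays min(squared distance to nearest center, lam);
    points paying lam are declared noise (label -1). Heuristic,
    illustrative only; the paper analyses SDP/LP relaxations.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.zeros(len(X), dtype=int)
    for _ in range(n_iters):
        d2 = ((X[:, None, :] - centers[None]) ** 2).sum(axis=-1)  # (n, k)
        labels = d2.argmin(axis=1)
        labels[d2.min(axis=1) > lam] = -1       # cheaper to call it noise
        for j in range(k):
            pts = X[labels == j]
            if len(pts) > 0:                    # keep old center if empty
                centers[j] = pts.mean(axis=0)
    return centers, labels
```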
Sum-of-Squares Polynomial Flow
The triangular map is a recent construct in probability theory that allows one to
transform any source probability density function to any target density
function. Based on triangular maps, we propose a general framework for
high-dimensional density estimation, by specifying one-dimensional
transformations (equivalently conditional densities) and appropriate
conditioner networks. This framework (a) reveals the commonalities and
differences of existing autoregressive and flow based methods, (b) allows a
unified understanding of the limitations and representation power of these
recent approaches and, (c) motivates us to uncover a new Sum-of-Squares (SOS)
flow that is interpretable, universal, and easy to train. We perform several
synthetic experiments on various density geometries to demonstrate the benefits
(and shortcomings) of such transformations. SOS flows achieve competitive
results in simulations and on several real-world datasets.
Comment: 13 pages, ICML'2019
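The key one-dimensional building block can be sketched directly: integrating a sum of squared polynomials yields a monotone, hence invertible, transformation, which is what makes the flow well-defined. The conditioner network that would produce the polynomial coefficients is omitted in this sketch.

```python
import numpy as np

def sos_transform(coeffs, c=0.0):
    """One-dimensional SOS transform T(z) = c + integral_0^z sum_i p_i(u)^2 du,
    where each p_i is the polynomial given by a row of `coeffs`.
    The integrand is a sum of squares, hence >= 0, so T is monotone
    increasing and therefore invertible (a valid 1-D flow).
    """
    total = np.polynomial.Polynomial([0.0])
    for row in coeffs:
        p = np.polynomial.Polynomial(row)
        total = total + p * p                 # square and accumulate
    T = total.integ()                         # antiderivative, zero at 0
    return lambda z: c + T(z)

# monotonicity check on a toy example (two quadratic polynomials)
T = sos_transform(np.array([[1.0, 0.5, 0.2], [0.3, -0.4, 0.1]]))
z = np.linspace(-3, 3, 7)
assert np.all(np.diff(T(z)) > 0)
```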
Convergence of Gradient Methods on Bilinear Zero-Sum Games
Min-max formulations have attracted great attention in the ML community due
to the rise of deep generative models and adversarial methods, while
understanding the dynamics of gradient algorithms for solving such formulations
has remained a grand challenge. As a first step, we restrict to bilinear
zero-sum games and give a systematic analysis of popular gradient updates, for
both simultaneous and alternating versions. We provide exact conditions for
their convergence and find the optimal parameter setup and convergence rates.
In particular, our results offer formal evidence that alternating updates
converge "better" than simultaneous ones
Regularizers versus Losses for Nonlinear Dimensionality Reduction: A Factored View with New Convex Relaxations
We demonstrate that almost all non-parametric dimensionality reduction
methods can be expressed by a simple procedure: regularized loss minimization
plus singular value truncation. By distinguishing the role of the loss and
regularizer in such a process, we recover a factored perspective that reveals
some gaps in the current literature. Beyond identifying a useful new loss for
manifold unfolding, a key contribution is to derive new convex regularizers
that combine distance maximization with rank reduction. These regularizers can
be applied to any loss.
Comment: Appears in Proceedings of the 29th International Conference on Machine Learning (ICML 2012).
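The shared second stage of the procedure described above, singular value truncation, is simple to state in code; the method-specific regularized loss minimization that produces the learned matrix M is omitted.

```python
import numpy as np

def svd_truncate_embed(M, k):
    """Final step shared by the nonparametric DR methods discussed:
    spectral truncation of a learned matrix M to a k-dimensional
    embedding (as in kernel PCA / MVU post-processing).
    """
    vals, vecs = np.linalg.eigh((M + M.T) / 2)   # symmetrize for safety
    idx = np.argsort(vals)[::-1][:k]             # keep the top-k spectrum
    return vecs[:, idx] * np.sqrt(np.clip(vals[idx], 0, None))
```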
Distributional Reinforcement Learning for Efficient Exploration
In distributional reinforcement learning (RL), the estimated distribution of
value function models both the parametric and intrinsic uncertainties. We
propose a novel and efficient exploration method for deep RL that has two
components. The first is a decaying schedule to suppress the intrinsic
uncertainty. The second is an exploration bonus calculated from the upper
quantiles of the learned distribution. In Atari 2600 games, our method
outperforms QR-DQN in 12 out of 14 hard games (achieving a 483% average gain
in cumulative rewards over QR-DQN across 49 games, with a big win in Venture).
We also compare our algorithm with QR-DQN in a challenging 3D driving
simulator (CARLA). Results show that our algorithm achieves near-optimal safety
rewards twice as fast as QR-DQN.
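A schematic version of the two components, assuming per-action quantile estimates as produced by a QR-DQN-style critic; the exact bonus and schedule used in the paper may differ.

```python
import numpy as np

def exploratory_action(quantiles, step, c=1.0, upper_frac=0.25):
    """Pick an action from quantile estimates with an optimism bonus.

    quantiles: array (n_actions, n_quantiles) of learned return
    quantiles. The bonus is the spread of the upper quantiles (a proxy
    for intrinsic uncertainty), scaled by a decaying schedule that
    suppresses it over time; schematic stand-in only.
    """
    mean = quantiles.mean(axis=1)
    n_q = quantiles.shape[1]
    upper = np.sort(quantiles, axis=1)[:, int((1 - upper_frac) * n_q):]
    bonus = upper.std(axis=1)                    # upper-quantile spread
    decay = c / np.sqrt(step + 1)                # decaying schedule
    return int(np.argmax(mean + decay * bonus))
```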
Indiscriminate Data Poisoning Attacks on Neural Networks
Data poisoning attacks, in which a malicious adversary aims to influence a
model by injecting "poisoned" data into the training process, have attracted
significant recent attention. In this work, we take a closer look at existing
poisoning attacks and connect them with old and new algorithms for solving
sequential Stackelberg games. By choosing an appropriate loss function for the
attacker and optimizing with algorithms that exploit second-order information,
we design poisoning attacks that are effective on neural networks. We present
efficient implementations that exploit modern auto-differentiation packages and
allow simultaneous and coordinated generation of tens of thousands of poisoned
points, in contrast to existing methods that generate poisoned points one by
one. We further perform extensive experiments that empirically explore the
effect of data poisoning attacks on deep neural networks.
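To make the Stackelberg structure concrete, here is a small bilevel sketch against a ridge-regression victim: the follower (victim) is retrained to optimality at every round, and the leader (attacker) ascends the validation loss through a numerical hyper-gradient. Finite differences stand in for the auto-differentiation and second-order machinery the paper exploits; all names here are illustrative.

```python
import numpy as np

def train_ridge(X, y, lam):
    """Victim best response: closed-form ridge regression."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

def val_loss(Xp, yp, X, y, X_val, y_val, lam=1e-2):
    """Attacker objective: victim's validation loss after retraining on
    clean data plus the current poison set (the inner problem of the
    sequential Stackelberg game, solved exactly here)."""
    w = train_ridge(np.vstack([X, Xp]), np.concatenate([y, yp]), lam)
    return 0.5 * np.mean((X_val @ w - y_val) ** 2)

def poison_ridge(X, y, X_val, y_val, n_poison=5, eta=1.0,
                 n_rounds=100, eps=1e-4, seed=0):
    """Poisoning via numerical hyper-gradient ascent on the validation
    loss; finite differences replace the paper's second-order methods."""
    rng = np.random.default_rng(seed)
    Xp = rng.normal(size=(n_poison, X.shape[1]))
    yp = -y[:n_poison]                       # flipped labels as targets
    for _ in range(n_rounds):
        g = np.zeros_like(Xp)
        for idx in np.ndindex(*Xp.shape):    # central finite differences
            Xp[idx] += eps
            up = val_loss(Xp, yp, X, y, X_val, y_val)
            Xp[idx] -= 2 * eps
            g[idx] = (up - val_loss(Xp, yp, X, y, X_val, y_val)) / (2 * eps)
            Xp[idx] += eps
        Xp += eta * g                        # ascend the outer objective
    return Xp, yp
```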
Robust Multiple Kernel k-means Clustering using Min-Max Optimization
Multiple kernel learning is a type of multiview learning that combines
different data modalities by capturing view-specific patterns using kernels.
Although supervised multiple kernel learning has been extensively studied,
until recently, only a few unsupervised approaches had been proposed.
Meanwhile, adversarial learning has recently received much attention. Many
works have been proposed to defend against adversarial examples. However,
little is known about the effect of adversarial perturbation in the context of
multiview learning, and even less in the unsupervised case. In this study, we
show that adversarial features added to a view can make the existing approaches
with the min-max formulation in multiple kernel clustering yield unfavorable
clusters. To address this problem and inspired by recent works in adversarial
learning, we propose a multiple kernel clustering method with the min-max
framework that aims to be robust to such adversarial perturbation. We evaluate
the robustness of our method on simulation data under different types of
adversarial perturbations and show that it outperforms several compared
existing methods. In the real-data analysis, we demonstrate the utility of our
method on a real-world problem.
Comment: R package is available at https://github.com/SeojinBang/MKK
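A schematic alternating loop conveys the min-max structure, though the paper's actual updates and robustness analysis differ: the min step clusters on the combined kernel via the standard spectral relaxation, and the max step re-weights views adversarially toward those the current clustering explains worst.

```python
import numpy as np

def minmax_mkkm(kernels, k, n_rounds=10):
    """Alternating sketch of min-max multiple kernel k-means.

    min step: spectral relaxation of kernel k-means on the combined
    kernel (top-k eigenvectors of K as relaxed indicators H).
    max step: upweight views the current clustering explains poorly,
    i.e. a worst-case kernel combination. Schematic only; the weight
    update here is an illustrative choice, not the paper's rule.
    """
    m = len(kernels)
    w = np.full(m, 1.0 / m)
    for _ in range(n_rounds):
        K = sum(wi * Ki for wi, Ki in zip(w, kernels))
        _, vecs = np.linalg.eigh(K)
        H = vecs[:, -k:]                          # relaxed cluster indicators
        resid = np.array([np.trace(Kv) - np.trace(H.T @ Kv @ H)
                          for Kv in kernels])     # unexplained variance
        w = np.exp(resid / (np.abs(resid).max() + 1e-12))
        w /= w.sum()                              # softmax max-step
    # final labels: run any k-means on the rows of H
    return H, w
```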